FEAT: Jailbreak Scenario Expansion #1340

Open
ValbuenaVC wants to merge 64 commits into Azure:main from ValbuenaVC:jailbreak2

ValbuenaVC commented Jan 30, 2026

Description

Adding more features to the Jailbreak scenario! Major changes:

  • JailbreakStrategy now supports multiple attack types through the ManyShot, PromptSending, Crescendo, and RedTeaming values.
  • Attack strategies can be grouped with the SINGLE_TURN and MULTI_TURN aggregates; PYRIT has been deprecated.
  • The initializer now accepts k_jailbreaks (how many jailbreaks to pick at random), num_tries (how many times to try each jailbreak), and jailbreak_names (which specific jailbreaks to use). Note that k_jailbreaks and jailbreak_names are mutually exclusive.
  • A default adversarial target has been added to support the attack strategies that require one.
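
The selection rules described for k_jailbreaks and jailbreak_names can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: select_jailbreaks is a hypothetical helper, the error messages are invented, and returning every template when neither argument is given is an assumption about the default behavior.

```python
import random
from typing import List, Optional, Sequence


def select_jailbreaks(
    all_templates: Sequence[str],
    k_jailbreaks: Optional[int] = None,
    jailbreak_names: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical helper: pick templates by name or at random, never both."""
    # The two selection modes are mutually exclusive, as the description states.
    if k_jailbreaks is not None and jailbreak_names:
        raise ValueError("k_jailbreaks and jailbreak_names are mutually exclusive")

    if jailbreak_names:
        # Reject any requested name that is not a known template.
        unknown = set(jailbreak_names) - set(all_templates)
        if unknown:
            raise ValueError(f"Unknown jailbreak templates: {sorted(unknown)}")
        return list(jailbreak_names)

    if k_jailbreaks is not None:
        if not 0 < k_jailbreaks <= len(all_templates):
            raise ValueError("k_jailbreaks must be between 1 and the number of templates")
        # Sample k distinct templates; an explicit rng keeps tests reproducible.
        return (rng or random).sample(list(all_templates), k_jailbreaks)

    # Assumption: with no selection arguments, every template is used.
    return list(all_templates)
```

Under these assumptions, num_tries would then control how many times each selected template is attempted; it is left out of the sketch because the description does not say how retries are scheduled.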

Tests and Documentation

  • Expanded to support new strategies.

ValbuenaVC changed the title from "Jailbreak Scenario Expansion" to "[DRAFT] Jailbreak Scenario Expansion" on Feb 5, 2026
ValbuenaVC changed the title from "[DRAFT] Jailbreak Scenario Expansion" to "[DRAFT] FEAT: Jailbreak Scenario Expansion" on Feb 5, 2026

          DatasetConfiguration: Configuration with airt_harms dataset.
      """
-     return DatasetConfiguration(dataset_names=["airt_harms"], max_dataset_size=4)
+     return DatasetConfiguration(dataset_names=["airt_harms"])

Contributor

Any reason for removing max_dataset_size? I think we set it so that our integration tests don't run the entire dataset by default, which would slow them down.

Is there anywhere the user provides a max prompt number that we could pass through to here? If it's set we would use it, and otherwise we keep the default of 4.

Contributor Author

I tried making it a user-provided parameter, but the method implicitly belongs to the Scenario superclass since it's called in initialize_async and only exposed as self._dataset_config per instance. I think this is a good feature for a future scenario refactor but is out of scope here, so I put it back to 4 for simplicity's sake and for the integration tests.
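
For reference, the pass-through discussed in this thread (use a user-supplied limit when present, otherwise fall back to 4) could look roughly like the sketch below. It is illustrative only: DatasetConfigSketch is a stand-in dataclass rather than the real DatasetConfiguration, and default_dataset_config with its max_dataset_size argument is a hypothetical signature, not something this PR adds.

```python
from dataclasses import dataclass
from typing import List, Optional

# The default limit mentioned in the thread, kept so integration tests stay fast.
DEFAULT_MAX_DATASET_SIZE = 4


@dataclass
class DatasetConfigSketch:
    """Stand-in for DatasetConfiguration; not the real class."""

    dataset_names: List[str]
    max_dataset_size: Optional[int] = None


def default_dataset_config(max_dataset_size: Optional[int] = None) -> DatasetConfigSketch:
    """Hypothetical factory: honor a caller-supplied limit, else use the default."""
    size = DEFAULT_MAX_DATASET_SIZE if max_dataset_size is None else max_dataset_size
    if size < 1:
        raise ValueError("max_dataset_size must be at least 1")
    return DatasetConfigSketch(dataset_names=["airt_harms"], max_dataset_size=size)
```

Comparing against None rather than relying on truthiness keeps an explicit 0 from silently turning into the default.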

      all_templates = TextJailBreak.get_jailbreak_templates()

      if jailbreak_names:
          diff = set(jailbreak_names) - set(all_templates)
Contributor

Curiosity: whoa, my brain doesn't compute this logic lol.

Is diff the set of names that are in jailbreak_names but not in all_templates?

Could we make the same comparison by looping over jailbreak_names and raising an error for any name that is not in set(all_templates)? And is this just a more efficient way of doing that?

Contributor Author

You computed it correctly 🙂 On a second look, though, it really wasn't readable, so I added a comment that explains how it works. The comparison is the same as the one you described, just more efficient.
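
The equivalence discussed in this thread can be checked with a small sketch. Both function names and the template names are made up for illustration; only the set-difference expression itself comes from the diff.

```python
from typing import Iterable, List


def unknown_names_set(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Set difference: names that were requested but are not available."""
    return sorted(set(requested) - set(available))


def unknown_names_loop(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Explicit loop form of the same check, as described in the review comment."""
    available_set = set(available)  # build once so each lookup is O(1) on average
    missing = set()
    for name in requested:
        if name not in available_set:
            missing.add(name)
    return sorted(missing)


# Hypothetical template names, used only to exercise the two helpers.
all_templates = ["template_a.yaml", "template_b.yaml", "template_c.yaml"]
jailbreak_names = ["template_b.yaml", "template_x.yaml"]

diff = unknown_names_set(jailbreak_names, all_templates)
same = unknown_names_loop(jailbreak_names, all_templates)
```

Both forms return the same names. The set difference is shorter and collects every unknown name in one pass, which makes it easy to report all of them in a single error message. The loop form is comparably fast as long as the set of available names is built once outside the loop.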
